ABSTRACT


Online learning has attracted a large number of participants and is becoming increasingly popular. However, the completion rates of online courses are notoriously low. Further, unlike in traditional education systems, teachers, if any, are unable to comprehensively evaluate the learning gain of each student through the online learning platform. Hence, we need an effective framework for evaluating students' performance in online education systems, predicting their expected outcomes, and detecting likely failures early. To this end, we introduce Deep Online Performance Evaluation (DOPE), which first models the student-course relations in an online system as a knowledge graph, then utilizes an advanced graph neural network to extract course and student embeddings, harnesses a recurrent neural network to encode the system's temporal student behavioral data, and ultimately predicts a student's performance in a given course. Comprehensive experiments on six online courses verify the effectiveness of DOPE across multiple settings against representative baseline methods. Furthermore, we perform an ablation feature analysis on the student behavioral features to better understand the inner workings of DOPE. The code and data are available from


https://github.com/hamidkarimi/dope
1. INTRODUCTION


Online learning has higher dropout and failure rates than traditional education systems. For instance, the completion rates of Massive Open Online Courses (MOOCs), an extension of online learning technologies, are low (0.7% to 52.1%, with a median of 12.6%, as reported by [20]). We also see similar situations in online courses offered by universities such as the Open University in the UK and in China [19].
Figure 1: Visual comparison of the learning/intervention process between online and in-person education systems.


Furthermore, since students typically drop out early in a course [33], the platform should detect which students are likely to drop out (or fail) as early as possible, so that it can intervene and hopefully prevent these negative outcomes. The question, then, is how we can assess students' performance and detect those who are likely to drop out or fail in an online course. To answer this question, we first need to take a closer look at the online learning system and see how it differs from traditional learning.


As illustrated in Figure 1 (right side), in the traditional learning setting, instructors can interact with students, assess their performance, and intervene if they sense that a student is likely to perform poorly in the class. In online learning systems, however, students primarily interact with the online platform, so we face the setting depicted on the left side of Figure 1. In this setting, there is inherently less interaction between students and instructors. More specifically, due to the high student-teacher ratio, teachers, if any, in online learning systems are unable to comprehensively evaluate the learning gain of each student. Thus, we seek to develop a methodology that can harness the interactions of students with an online platform and accurately predict their course outcome (e.g., pass or fail). Such a system could then be used in real time throughout the course to identify the students who are predicted to perform poorly and to direct the limited intervention resources inherent in online systems toward them.


Given the above discussion, we propose a framework named Deep Online Performance Evaluation (DOPE) to predict students' course performance in online learning. DOPE first models the student-course relations of the online system as a knowledge graph. To incorporate an aggregated overview of the students and courses in the online system, DOPE learns student and course embeddings from this knowledge graph. More specifically, we employ a relational graph neural network that can handle the rich attribute information found in our knowledge graph (e.g., student demographic data). Then, our proposed approach utilizes a recurrent neural network (RNN) to encode the temporal student behavioral data into a feature vector. The student behavioral data comes from student click patterns, extracted and aggregated into weekly snapshots that represent how each student has interacted with the online learning system. Finally, the student and course embeddings (extracted from the knowledge graph) are combined with the encoded behavioral data for the given student and course, and are fed to a classifier to predict the student's performance. In summary, our contributions are as follows.


1. We propose the use of a knowledge graph to model complex online learning environments, which allows richer data to be extracted than representing the data in a traditional unstructured way; and


2. Our proposed framework to predict student course out- 
comes contains two novel components, namely a relational 
graph neural network to extract student and course em- 
beddings from the formed knowledge graph and a recur- 
rent neural network model for encoding student behav- 
ioral data according to their clicks in the online system. 


2. PROBLEM STATEMENT 


Suppose that, from the set of courses in an online system, we have a subset of $m$ courses denoted as $\mathcal{C} = \{c_1, c_2, \ldots, c_m\}$. Furthermore, let there be $n$ students who have enrolled in at least one of the $m$ courses in $\mathcal{C}$, which we denote as $\mathcal{S} = \{s_1, s_2, \ldots, s_n\}$. For each course $c_j$, we assume there are some course features that can be represented as the vector $\mathbf{f}_j \in \mathbb{R}^{d_c}$, with $d_c$ being the dimension size after encoding the course features. Similarly, for each student $s_i$ we assume some demographic information has been collected that can be represented as the vector $\mathbf{d}_i \in \mathbb{R}^{d_s}$, with $d_s$ being the dimension size after encoding the student demographic data. In addition to the demographic data, the system is assumed to have collected some sequential behavioral data for each student $s_i$ enrolled in course $c_j$, which we represent as $\mathbf{B}_{ij} = [\mathbf{B}_{ij}^{1}, \mathbf{B}_{ij}^{2}, \ldots, \mathbf{B}_{ij}^{k}]$, where $\mathbf{B}_{ij}^{w} \in \mathbb{R}^{q}$ represents an encoding of the behavior of student $s_i$ during the $w$-th week of course $c_j$, $k$ represents the number of weeks for which behavioral data was collected, and $q$ is the dimension of the encoded weekly student behavior. In other words, we have a tensor of student behavioral data $\mathcal{B} \in \mathbb{R}^{n \times m \times k \times q}$. For each student $s_i$, we represent their performance outcome in course $c_j$ as $o_{ij}$, where we assume there are $|\mathcal{P}|$ possible outcomes, collected in the set $\mathcal{P}$ (with $p \in \mathcal{P}$ denoting a single outcome).


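Now, given the notation above, we seek to learn a model $f(\cdot \mid \theta)$ with parameters $\theta$ that can predict the student course outcomes $\mathcal{O}$ as follows:


$$\mathcal{M}\big(\mathcal{C}, \mathcal{S}, \mathcal{F}, \mathcal{D}, \mathcal{B}, \mathcal{O}, f(\cdot \mid \theta)\big) \rightarrow \theta$$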


where $\mathcal{M}$ denotes the machine learning (artificial intelligence) process, $\mathcal{B}$ represents the behavioral (e.g., click) data for the given set of courses $\mathcal{C}$ using only the first $k$ weeks of data, $\mathcal{F}$ represents the set of course features of $\mathcal{C}$, $\mathcal{D}$ denotes the set of demographic data for the students in $\mathcal{S}$, $\mathcal{O}$ represents the performance outcomes of the students in $\mathcal{S}$, and $\theta$ denotes the learned parameters of $f(\cdot \mid \theta)$.


Figure 2: Visualizing the traditional representation used in prior supervised learning prediction models as compared to our knowledge graph representation.


Figure 3: Visualizing the aggregation process by which student and course embeddings are formed from their multi-hop knowledge graph neighborhoods. (a) Student aggregation process; (b) Course aggregation process.


3. PROPOSED MODEL 


In this section, we explain our proposed model in detail. 


3.1 Knowledge Graph Representation 

We first model the historical online course data in the form of a knowledge graph, as shown in Figure 2. Our knowledge graph formulation in Figure 2(b) offers a richer representation than the traditional representation of independent student-course relations shown in Figure 2(a), because the graph structure lets us leverage relations between students and courses beyond those captured in Figure 2(a). We let $\mathcal{G} = \{\mathcal{C}, \mathcal{S}, \mathbf{X}_c, \mathbf{X}_s, \mathcal{B}, \mathcal{A}\}$ represent a knowledge graph $\mathcal{G}$ containing the set of $m$ course nodes $\mathcal{C}$, the set of $n$ student nodes $\mathcal{S}$, the course features $\mathbf{X}_c \in \mathbb{R}^{m \times d_c}$ constructed from $\mathcal{F}$, the student demographic features $\mathbf{X}_s \in \mathbb{R}^{n \times d_s}$ constructed from $\mathcal{D}$, the behavioral data $\mathcal{B}$ representing complex sequential edge features, and an adjacency tensor $\mathcal{A} \in \mathbb{R}^{n \times m \times |\mathcal{P}|}$ constructed from the $|\mathcal{P}|$ different student-course outcome relations, where $\mathcal{A}_{ij}^{p} = 1$ if $o_{ij} \in \mathcal{O}$ and $o_{ij} = p$ (with $\mathcal{A}_{ij}^{p} = 0$ otherwise). Now, given the knowledge graph $\mathcal{G}$, we seek to extract student and course embeddings by using a relational graph neural network.
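A minimal sketch of how the adjacency tensor $\mathcal{A}$ could be built from observed outcomes, assuming outcomes are stored in a dictionary keyed by `(student index, course index)` with integer-coded outcome types. The helper name `build_adjacency` and the outcome ordering (using the four OULAD categories from Section 4.1) are illustrative assumptions.

```python
import numpy as np

OUTCOMES = ["Distinction", "Pass", "Fail", "Withdrawn"]  # assumed index order

def build_adjacency(outcomes, n, m, num_relations):
    """Return A with A[i, j, p] = 1 iff o_ij is observed and o_ij = p."""
    A = np.zeros((n, m, num_relations), dtype=np.float32)
    for (i, j), p in outcomes.items():
        A[i, j, p] = 1.0
    return A

# (student, course) -> outcome index; student 1 never took course 1.
observed = {(0, 0): 1, (0, 1): 2, (1, 0): 1}
A = build_adjacency(observed, n=2, m=2, num_relations=len(OUTCOMES))
print(A[0, 1])  # [0. 0. 1. 0.]  (student 0 failed course 1)
```

Each observed student-course pair activates exactly one relation slice, and unobserved pairs stay all-zero, which is what the definition of $\mathcal{A}_{ij}^{p}$ requires.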


3.2 Relational Graph Neural Network 
Recently, graph neural networks (GNNs) have become increasingly popular due to their ability to apply deep learning to graph-structured data. One popular class of GNNs is graph convolutional networks (GCNs) [7], which have their roots in classical CNNs. The general idea of these GCN models is to learn a better set of latent features for each node by aggregating information from its neighbors. In the context of our problem, to better understand and represent a student, rather than directly using their features alone, we could use a 1-layer GCN that incorporates the features of all the courses the student has taken. For example, in Figure 3(a), a 1-layer GCN would utilize the 1-hop neighbors of $s_i$, namely the course $c_3$ that they passed and the course $c_j$ that they failed. It is then natural to see in Figure 3(a) that a 2-layer GCN would additionally incorporate the 2-hop neighbors, which include information from all the classmates of $s_i$ in each of the two courses they have taken, thus providing further context for learning a more comprehensive embedding of student $s_i$. Specifically, we harness a relational graph convolutional network [34]. Next, we provide the details of how the first layer (or, equivalently, a 1-layer) GCN constructs learned representations $\mathbf{h}_{s_i}^{1}$ and $\mathbf{h}_{c_j}^{1}$ for the student $s_i$ and course $c_j$, respectively, from the initial student features $\mathbf{X}_s$, course features $\mathbf{X}_c$, and adjacency tensor $\mathcal{A}$ in our knowledge graph representation.
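To illustrate which nodes a 1-layer and a 2-layer GCN draw on, the following pure-Python sketch computes a student's 1-hop courses and 2-hop classmates from a toy list of `(student, course, outcome)` edges. The edge list and the helper `receptive_field` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (student, course, outcome) edges of a toy knowledge graph.
edges = [
    ("s1", "c1", "Fail"), ("s1", "c3", "Pass"),
    ("s2", "c1", "Pass"), ("s3", "c3", "Distinction"),
    ("s4", "c2", "Withdrawn"),
]

courses_of = defaultdict(set)   # student -> courses taken (1 hop from a student)
students_of = defaultdict(set)  # course -> enrolled students (1 hop from a course)
for s, c, _ in edges:
    courses_of[s].add(c)
    students_of[c].add(s)

def receptive_field(student):
    """Return (1-hop courses, 2-hop classmates) aggregated for `student`."""
    one_hop = set(courses_of[student])
    # 2 hops: students reachable through any shared course, excluding the student.
    two_hop = {t for c in one_hop for t in students_of[c]} - {student}
    return one_hop, two_hop

one_hop, two_hop = receptive_field("s1")
print(sorted(one_hop))  # ['c1', 'c3']
print(sorted(two_hop))  # ['s2', 's3']
```

Student `s4` shares no course with `s1`, so it never enters `s1`'s 2-hop neighborhood; reaching it would require more layers and a connecting path.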


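First Layer Embeddings. First, recall that connections between students and courses are stored in the tensor $\mathcal{A}$, where $\mathcal{A}_{ij}^{p} = 1$ if $o_{ij} \in \mathcal{O}$ and $o_{ij} = p$ (with $\mathcal{A}_{ij}^{p} = 0$ otherwise). Thus, for a student $s_i$ we define the set of courses for which they had outcome $p$ as $\mathcal{N}^{p}(s_i)$. Similarly, for a course $c_j$ we define the set of students who received outcome $p$ as $\mathcal{N}^{p}(c_j)$. Given this notation, we can define the first layer representations $\mathbf{h}_{s_i}^{1}$ and $\mathbf{h}_{c_j}^{1}$ for the student $s_i$ and course $c_j$, respectively, as follows:


$$\mathbf{h}_{s_i}^{1} = \sigma\Big(\mathbf{W}_{\mathrm{self}}^{1}\mathbf{X}_{s[i]} + \sum_{p \in \mathcal{P}} \frac{1}{|\mathcal{N}^{p}(s_i)|} \sum_{c_j \in \mathcal{N}^{p}(s_i)} \mathbf{W}_{p}^{1}\mathbf{X}_{c[j]}\Big) \tag{1}$$

$$\mathbf{h}_{c_j}^{1} = \sigma\Big(\mathbf{W}_{\mathrm{self}}^{1}\mathbf{X}_{c[j]} + \sum_{p \in \mathcal{P}} \frac{1}{|\mathcal{N}^{p}(c_j)|} \sum_{s_i \in \mathcal{N}^{p}(c_j)} \mathbf{W}_{p}^{1}\mathbf{X}_{s[i]}\Big) \tag{2}$$


where $\sigma$ is an element-wise non-linear activation function (e.g., $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$ [13]), $\mathbf{X}_{s[i]}$ denotes the student features for $s_i$, $\mathbf{X}_{c[j]}$ denotes the course features for $c_j$, $\mathbf{W}_{\mathrm{self}}^{1}$ is used to transform the self features from the original features, and $\mathbf{W}_{p}^{1}$ is used to transform the features that are linked through the relation (i.e., course outcome type) $p$ in the first layer.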
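Eq. (1) can be read almost directly as code. The following is a minimal NumPy sketch for a single student, assuming ReLU as $\sigma$ and random weights. Because student and course features generally have different dimensions ($d_s \neq d_c$), the sketch uses a self weight of shape $e \times d_s$ and relation weights of shape $e \times d_c$; this is our reading of the notation, not a detail stated in the text. Relations with an empty $\mathcal{N}^{p}(s_i)$ are skipped, which is likewise an assumption about how the $1/|\mathcal{N}^{p}(s_i)|$ term is handled.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def first_layer_student(i, X_s, X_c, A, W_self, W_rel):
    """Compute h^1_{s_i} following Eq. (1).

    X_s: (n, d_s) student features, X_c: (m, d_c) course features,
    A: (n, m, P) adjacency tensor, W_self: (e, d_s), W_rel: (P, e, d_c).
    """
    out = W_self @ X_s[i]                        # self term W_self X_s[i]
    for p in range(A.shape[2]):
        neighbors = np.nonzero(A[i, :, p])[0]    # indices of N^p(s_i)
        if neighbors.size == 0:
            continue                             # no course with outcome p
        # W_p is linear, so averaging features first equals averaging W_p X_c[j].
        out = out + W_rel[p] @ X_c[neighbors].mean(axis=0)
    return relu(out)

# Toy graph: 3 students, 2 courses, 4 outcome types.
rng = np.random.default_rng(0)
n, m, P, d_s, d_c, e = 3, 2, 4, 5, 6, 8
X_s = rng.normal(size=(n, d_s))
X_c = rng.normal(size=(m, d_c))
A = np.zeros((n, m, P))
A[0, 0, 2] = 1.0   # s_0 received outcome 2 in c_0
A[0, 1, 1] = 1.0   # s_0 received outcome 1 in c_1
W_self = rng.normal(size=(e, d_s))
W_rel = rng.normal(size=(P, e, d_c))
h = first_layer_student(0, X_s, X_c, A, W_self, W_rel)
print(h.shape)  # (8,)
```

Eq. (2) is the mirror image: swap the roles of `X_s` and `X_c` and read neighbors along the student axis of `A` instead of the course axis.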


Final Student Embeddings. Assuming $L$ layers in our GCN model, we define the last layer, from which we obtain the student embedding $\mathbf{z}_{i}^{s} = \mathbf{h}_{s_i}^{L}$ for $s_i$, as follows:


$$\mathbf{z}_{i}^{s} = \mathbf{h}_{s_i}^{L} = \sigma\Big(\mathbf{W}_{\mathrm{self}}^{L}\mathbf{h}_{s_i}^{L-1} + \sum_{p \in \mathcal{P}} \frac{1}{|\mathcal{N}^{p}(s_i)|} \sum_{c_j \in \mathcal{N}^{p}(s_i)} \mathbf{W}_{p}^{L}\mathbf{h}_{c_j}^{L-1}\Big)$$


where $\mathbf{h}_{s_i}^{l}$ represents the representation of student $s_i$ at layer $l$ of the GCN. Note that if we were to use a 2-layer GCN (i.e., $L = 2$), then $\mathbf{h}_{s_i}^{L-1} = \mathbf{h}_{s_i}^{1}$ would come from Eq. (1), and similarly $\mathbf{h}_{c_j}^{L-1} = \mathbf{h}_{c_j}^{1}$ from Eq. (2).


Final Course Embeddings. Similarly, assuming $L$ layers, we obtain the course embedding $\mathbf{z}_{j}^{c} = \mathbf{h}_{c_j}^{L}$ for $c_j$ from the last layer as follows:


$$\mathbf{z}_{j}^{c} = \mathbf{h}_{c_j}^{L} = \sigma\Big(\mathbf{W}_{\mathrm{self}}^{L}\mathbf{h}_{c_j}^{L-1} + \sum_{p \in \mathcal{P}} \frac{1}{|\mathcal{N}^{p}(c_j)|} \sum_{s_i \in \mathcal{N}^{p}(c_j)} \mathbf{W}_{p}^{L}\mathbf{h}_{s_i}^{L-1}\Big)$$


where $\mathbf{h}_{c_j}^{l}$ is the embedding of course $c_j$ at layer $l$ of the GCN.


Figure 4: Visualizing the entire DOPE model, consisting of both the relational graph neural network and recurrent neural network components.
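Stacking layers is easier to see in matrix form. In the following NumPy sketch, each relation slice of $\mathcal{A}$ is row-normalized so that a matrix product computes the $\frac{1}{|\mathcal{N}^{p}(\cdot)|}\sum$ terms for all students (or all courses) at once. Each layer then updates students and courses from the previous layer's states, as in the equations above. The per-layer weight dictionary (separate self and relation weights for the student and course updates), ReLU as $\sigma$, and the names `row_normalize`, `rgcn_embeddings`, and `make_layer` are assumptions made for illustration.

```python
import numpy as np

def row_normalize(M):
    """Divide each row by its neighbor count; rows without neighbors stay zero."""
    deg = M.sum(axis=1, keepdims=True)
    return np.divide(M, deg, out=np.zeros_like(M), where=deg > 0)

def rgcn_embeddings(X_s, X_c, A, layers):
    """Return (Z_s, Z_c) after len(layers) relational GCN layers.

    A: (n, m, P) adjacency tensor. Each layer is a dict of weights:
    'Ws_self' (e_out, e_s_in), 'Ws_rel' (P, e_out, e_c_in),
    'Wc_self' (e_out, e_c_in), 'Wc_rel' (P, e_out, e_s_in).
    """
    P = A.shape[2]
    A_sc = [row_normalize(A[:, :, p]) for p in range(P)]    # (n, m): mean over N^p(s_i)
    A_cs = [row_normalize(A[:, :, p].T) for p in range(P)]  # (m, n): mean over N^p(c_j)
    H_s, H_c = X_s, X_c
    for W in layers:
        new_s = H_s @ W["Ws_self"].T
        new_c = H_c @ W["Wc_self"].T
        for p in range(P):
            new_s = new_s + A_sc[p] @ H_c @ W["Ws_rel"][p].T
            new_c = new_c + A_cs[p] @ H_s @ W["Wc_rel"][p].T
        # Both updates read the layer l-1 states before either is overwritten.
        H_s, H_c = np.maximum(0.0, new_s), np.maximum(0.0, new_c)
    return H_s, H_c

rng = np.random.default_rng(1)
n, m, P, d_s, d_c, e = 4, 3, 4, 5, 6, 8
X_s = rng.normal(size=(n, d_s))
X_c = rng.normal(size=(m, d_c))
A = np.zeros((n, m, P))
A[0, 0, 1] = A[1, 0, 2] = A[2, 2, 0] = 1.0   # student 3 and course 1 are isolated

def make_layer(in_s, in_c, out):
    """Hypothetical random weights for one layer."""
    return {"Ws_self": rng.normal(size=(out, in_s)),
            "Ws_rel": rng.normal(size=(P, out, in_c)),
            "Wc_self": rng.normal(size=(out, in_c)),
            "Wc_rel": rng.normal(size=(P, out, in_s))}

layers = [make_layer(d_s, d_c, e), make_layer(e, e, e)]   # L = 2
Z_s, Z_c = rgcn_embeddings(X_s, X_c, A, layers)
print(Z_s.shape, Z_c.shape)  # (4, 8) (3, 8)
```

With a single layer, this reduces to Eqs. (1) and (2) for every node simultaneously. An isolated node keeps only its self term, because its normalized adjacency row is zero.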


Next, we will discuss how DOPE uses an RNN to encode a stu- 
dent’s sequential behavior data associated with a given course. 


3.3 Encoding Student Behavioral Data


In this part, we discuss how to encode the sequential edge features, i.e., the behavioral student data. Recall that when a student $s_i$ is enrolled in a course $c_j$, by the $k$-th week we will have the features $\mathbf{B}_{ij} = [\mathbf{B}_{ij}^{1}, \mathbf{B}_{ij}^{2}, \ldots, \mathbf{B}_{ij}^{k}]$. To better represent the behavioral data, we utilize a Long Short-Term Memory (LSTM) network [16], an effective RNN variant designed to extract temporal features from sequential data such as videos, speech [25], and text. Furthermore, LSTMs have shown strong abilities to capture temporal online user behaviors [26]. We fix the length of the behavior feature sequence for all students to $k$ (e.g., 10 weeks). Then, for a given behavioral sequence $\mathbf{B}_{ij}$, at each week $t \in [1, k]$ an LSTM unit takes the $t$-th week's click feature vector $\mathbf{B}_{ij}^{t}$ as input and uses the standard LSTM formulation to produce the output behavioral vector $\mathbf{h}_{ij}^{t}$. The final output of the LSTM, given the sequence $\mathbf{B}_{ij}$ as input, is $\mathbf{h}_{ij}^{k} \in \mathbb{R}^{e^{b}}$ (i.e., the output of the last LSTM unit). We then set the encoded behavior of student $s_i$ for course $c_j$ to be the $e^{b}$-dimensional vector $\mathbf{z}_{ij}^{b} = \mathbf{h}_{ij}^{k}$.
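The following NumPy sketch runs a single-layer LSTM forward pass over the $k$ weekly vectors and returns the last hidden state as $\mathbf{z}_{ij}^{b}$. It uses the standard gate equations (input, forget, and output gates plus a candidate cell state). The stacked weight layout, the gate order, the hidden size $e^{b} = 16$, and the random weights are illustrative assumptions. A real implementation would more likely use a library layer such as PyTorch's `torch.nn.LSTM`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(B_ij, W, U, b):
    """Encode a (k, q) weekly sequence into the last hidden state h^k.

    W: (4e, q) input weights, U: (4e, e) recurrent weights, b: (4e,) bias,
    stacked in the (assumed) gate order [input, forget, output, candidate].
    """
    e = U.shape[1]
    h = np.zeros(e)
    c = np.zeros(e)
    for x_t in B_ij:                      # one step per week t = 1..k
        gates = W @ x_t + U @ h + b
        i = sigmoid(gates[0:e])           # input gate
        f = sigmoid(gates[e:2 * e])       # forget gate
        o = sigmoid(gates[2 * e:3 * e])   # output gate
        g = np.tanh(gates[3 * e:])        # candidate cell state
        c = f * c + i * g                 # update the cell state
        h = o * np.tanh(c)                # hidden state h^t
    return h                              # z^b_ij = h^k

rng = np.random.default_rng(0)
k, q, e = 10, 20, 16                      # 10 weeks, 20 click types (Section 4.1)
B_ij = rng.poisson(3.0, size=(k, q)).astype(float)
W = rng.normal(scale=0.1, size=(4 * e, q))
U = rng.normal(scale=0.1, size=(4 * e, e))
b = np.zeros(4 * e)
z_b = lstm_encode(B_ij, W, U, b)
print(z_b.shape)  # (16,)
```

Taking only the last hidden state gives a fixed-size encoding regardless of $k$. This property is what lets $\mathbf{z}_{ij}^{b}$ be concatenated with the graph embeddings in the next step.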


3.4 Final Course Performance Classifier 

Here we combine the student and course embeddings from the relational graph convolutional network with the encoded behavioral data and feed them into a classifier, as shown in Figure 4. Given the student embedding $\mathbf{z}_{i}^{s}$ for student $s_i$, the course embedding $\mathbf{z}_{j}^{c}$ for course $c_j$, and the encoded behavior $\mathbf{z}_{ij}^{b}$ of $s_i$ in course $c_j$, we form the final feature representation as follows:

$$\mathbf{x}_{ij} = \mathbf{z}_{i}^{s} \,\|\, \mathbf{z}_{j}^{c} \,\|\, \mathbf{z}_{ij}^{b}$$

where $\|$ denotes concatenation, so the three components are combined into a single $(e^{s} + e^{c} + e^{b})$-dimensional representation. For training DOPE, we use supervised learning in which the labels are the historical outcomes $o_{ij} \in \mathcal{O}$, matched with the corresponding training student and course pair $(s_i, c_j)$. More specifically, we construct a minibatch set $\mathcal{M}$ that contains triplets of the form $(s_i, c_j, \tau)$, where $\tau = o_{ij}$ (i.e., the course outcome type), and we denote the outcome type set by $\mathcal{T}$, where $|\mathcal{T}| = |\mathcal{P}|$ since there are $|\mathcal{P}|$ course outcome types. The objective is then formalized as follows:


$$\mathcal{L} = -\frac{1}{|\mathcal{M}|} \sum_{(s_i, c_j, \tau = o_{ij}) \in \mathcal{M}} \log \frac{\exp\big(\boldsymbol{\theta}_{\tau}^{\mathrm{MLC}}\mathbf{x}_{ij}\big)}{\sum_{\tau' \in \mathcal{T}} \exp\big(\boldsymbol{\theta}_{\tau'}^{\mathrm{MLC}}\mathbf{x}_{ij}\big)} + \lambda \, \mathrm{Reg}\big(\boldsymbol{\theta}^{\mathrm{RGCN}}, \boldsymbol{\theta}^{\mathrm{LSTM}}, \boldsymbol{\theta}^{\mathrm{MLC}}\big) \tag{3}$$


where the classifier first maps $\mathbf{x}_{ij}$ to a $|\mathcal{P}|$-dimensional vector through the parameters $\boldsymbol{\theta}^{\mathrm{MLC}}$ (since we have $|\mathcal{P}|$ different outcomes, i.e., link labels in the knowledge graph) and then applies the softmax function to obtain the outcome probabilities; $\lambda$ weights the regularization term $\mathrm{Reg}(\cdot)$ over the parameters of the relational GCN, the LSTM, and the classifier.


Table 1: The description of the dataset.

Name | Train Periods | Test Periods | #Students
--- | --- | --- | ---
SS1 | 2013J | 2014J | 735
SS2 | 2013B, 2013J | 2014B, 2014J | 6622
SS3 | 2013J, 2014B | 2014J | 2366
ST1 | 2013B, 2013J | 2014B, 2014J | 5745
ST2 | 2013J, 2014B | 2014J | 2685
ST3 | 2013B, 2013J | 2014B, 2014J | 7092
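A minimal NumPy sketch of Eq. (3) for one minibatch: concatenate the three embeddings, apply a linear map standing in for $\boldsymbol{\theta}^{\mathrm{MLC}}$, and average the negative log-softmax of the true outcome. Using a single linear layer and an L2 penalty for $\mathrm{Reg}(\cdot)$ are assumptions, since the text fixes neither. Also, this sketch only penalizes whatever arrays are passed in `params`, whereas Eq. (3) regularizes all three parameter groups.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def dope_loss(z_s, z_c, z_b, labels, theta, params, lam):
    """Mean cross-entropy over a minibatch plus an (assumed) L2 penalty.

    z_s: (B, e_s), z_c: (B, e_c), z_b: (B, e_b) embeddings per (s_i, c_j);
    labels: (B,) integer outcome types; theta: (|P|, e_s + e_c + e_b);
    params: arrays to regularize (hypothetical stand-in for Reg).
    """
    x = np.concatenate([z_s, z_c, z_b], axis=1)   # x_ij = z^s_i || z^c_j || z^b_ij
    logp = log_softmax(x @ theta.T)               # (B, |P|) log-probabilities
    nll = -logp[np.arange(len(labels)), labels].mean()
    reg = sum(float((w ** 2).sum()) for w in params)
    return nll + lam * reg

rng = np.random.default_rng(0)
Bsz, e_s, e_c, e_b, P = 5, 8, 8, 16, 4
z_s, z_c, z_b = (rng.normal(size=(Bsz, d)) for d in (e_s, e_c, e_b))
labels = rng.integers(0, P, size=Bsz)
theta = np.zeros((P, e_s + e_c + e_b))            # uniform predictions
loss = dope_loss(z_s, z_c, z_b, labels, theta, [theta], lam=1e-4)
print(f"{loss:.4f}")  # 1.3863, i.e. log(4) for 4 equally likely outcomes
```

Subtracting the row maximum before exponentiating leaves the softmax unchanged but avoids overflow for large logits.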


4. EXPERIMENTS 


In this section, we conduct experiments to verify the effectiveness of our proposed method.


4.1 Dataset and Experimental Settings 

Online education platforms utilize virtual learning environments (VLEs) to collect records of all students' interactions, which provides an opportunity to analyze students' learning behavior. In this study, we use the Open University Learning Analytics Dataset (OULAD) [29], which contains 22 Open University courses from 2013 and 2014 and 32,593 students. The dataset includes student demographic information, student assessment results, and daily interactions with the university's VLE (10,655,280 entries). In each year, courses are offered in two distinct modules denoted as B and J (these are essentially similar to semesters in a conventional education system), and each module lasts around 35 to 40 weeks. The outcome of a course for a student falls into one of four categories: Distinction, Pass, Fail, and Withdrawn. From OULAD, we select three social science courses (i.e., SS1, SS2, and SS3) and three Science, Technology, Engineering, and Mathematics (STEM) courses (i.e., ST1, ST2, and ST3), as listed in Table 1.


To represent the behavioral data, we count the number of weekly clicks a student makes for each type of activity, e.g., accessing resources, webpage clicks, forum clicks, quiz attempts, and so on. The size of each weekly behavioral vector is 20. Further, the course attributes consist of two one-hot encoded vectors: one identifying the course among the 6 courses, and the other indicating whether the course is social science or STEM. Train and test periods are shown in Table 1. We use 10% of the training data as a validation set to tune the hyper-parameters. The implementation uses the PyTorch package [30]. Each model is trained for 200 epochs with a learning rate of 0.001 and a decay rate of 0.99 every 100 steps. As the evaluation metric, we use the weighted F1 score, i.e., the per-class F1 scores (each the harmonic mean of precision and recall) averaged with weights proportional to the number of samples in each class.
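Because the weighted F1 score averages per-class F1 scores by class support, it can differ noticeably from plain accuracy on imbalanced outcome sets. The pure-Python sketch below computes this metric. It also includes a staircase learning-rate schedule, which assumes that "a decaying rate of 0.99 every 100 steps" means multiplying the rate by 0.99 after every 100 optimization steps. The function names are hypothetical. In practice, scikit-learn's `f1_score(y_true, y_pred, average="weighted")` computes the same metric.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += count * f1            # weight each class by its support
    return total / len(y_true)

def step_decay_lr(step, base_lr=0.001, decay=0.99, every=100):
    """Staircase decay: multiply base_lr by `decay` once per `every` steps (assumed)."""
    return base_lr * decay ** (step // every)

y_true = ["Pass", "Pass", "Pass", "Fail"]
y_pred = ["Pass", "Pass", "Fail", "Fail"]
print(f"{weighted_f1(y_true, y_pred):.4f}")  # 0.7667
print(step_decay_lr(250))  # 0.001 * 0.99 ** 2 after two decays
```

In the example, "Pass" has F1 0.8 with support 3 and "Fail" has F1 2/3 with support 1, giving a weighted F1 of (3 × 0.8 + 2/3) / 4 = 23/30.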


4.2 Baseline Methods 


We compare the performance of DOPE with the following baseline 
methods. 


• SVM. In this baseline method, we concatenate the course attributes, the students' demographic features, and the weekly click data (i.e., behavioral data) into a single vector and feed it to a support vector machine with a radial basis function kernel.


• LR. This is similar to SVM, except that we use logistic regression for classification. We include this baseline to measure how well a simple classification method without any kernel or non-linearity handles the online course performance prediction problem.


• DOPE_FCN. This is a variation of DOPE in which, instead of modeling the behavioral data with an LSTM, we use a fully connected network. We include this method to evaluate the effectiveness of the way we model sequential behavioral data.
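The SVM and LR baselines above treat the weekly clicks as one flat vector rather than as a sequence. The NumPy sketch below shows one way to build such an input for a single (student, course) pair. It concatenates the course attributes, the demographic features, and the $k \times q$ click matrix flattened in week order. The sizes in the example use the two one-hot course vectors from Section 4.1, with an assumed order for the social science / STEM vector, and a hypothetical 10-dimensional demographic encoding.

```python
import numpy as np

def baseline_features(course_feat, demo_feat, weekly_clicks):
    """Concatenate static features with the flattened (k, q) click matrix.

    Flattening discards the explicit week structure: the classifier sees
    week t's counts only as positions t*q to t*q + q - 1 of one long vector.
    """
    return np.concatenate([course_feat, demo_feat, weekly_clicks.reshape(-1)])

course_onehot = np.eye(6)[2]            # third of the 6 courses (one-hot)
field_onehot = np.array([0.0, 1.0])     # social science vs. STEM (assumed order)
course_feat = np.concatenate([course_onehot, field_onehot])
demo_feat = np.zeros(10)                # hypothetical demographic encoding
weekly_clicks = np.ones((20, 20))       # k = 20 weeks, q = 20 click types
x = baseline_features(course_feat, demo_feat, weekly_clicks)
print(x.shape)  # (418,)
```

The input length grows linearly with $k$ (8 + 10 + 20 × 20 = 418 here), whereas the LSTM in DOPE keeps a fixed-size state while reading the weeks one at a time.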


We compare DOPE with the baseline methods for different amounts of weekly click data, i.e., 5, 10, 15, and 20 weeks. By doing so, we can measure how effective DOPE is at predicting a student's course performance early. We note that 20 weeks is almost half of a course period, when there is still adequate time for intervention if a failure is predicted.


4.3 Binary Classification 

As mentioned before, our dataset includes 4 distinct labels for a student's performance in a course, namely Distinction, Pass, Fail, and Withdrawn. In this section, we merge Distinction and Pass into a single class “Pass”, and Fail and Withdrawn into a single class “Fail”, and then perform binary classification. Figure 5 illustrates the experimental results for all courses. We make the following observations based on the results presented in Figure 5.


• In general, the more weekly click data is introduced, the better we can predict the students' outcomes, and DOPE benefits more from this additional data than the other methods. In particular, as early as 20 weeks from the start of a course (i.e., almost in the middle of the course), it can predict students' outcomes with high performance. This allows teachers or online course administrators to take actionable intervention measures to help students with poor performance.


• DOPE achieves better performance than DOPE_FCN. This shows that the LSTM component, which extracts temporal features from click behaviors, is necessary and contributes to the model's predictive power.


• DOPE is effective for all courses: it achieves an F1 score of more than 0.8 across all courses when 20 weeks of click data are considered.
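Collapsing the four OULAD outcomes into the binary setting used above is a simple relabeling. Here is a sketch, assuming the labels are stored as the category strings listed in Section 4.1. The mapping name `BINARY_LABEL` and the helper `to_binary` are hypothetical.

```python
# Binary setting: Distinction/Pass -> "Pass", Fail/Withdrawn -> "Fail".
BINARY_LABEL = {
    "Distinction": "Pass",
    "Pass": "Pass",
    "Fail": "Fail",
    "Withdrawn": "Fail",
}

def to_binary(outcomes):
    """Collapse 4-class outcomes to Pass/Fail; unknown labels raise KeyError."""
    return [BINARY_LABEL[o] for o in outcomes]

print(to_binary(["Distinction", "Withdrawn", "Pass", "Fail"]))
# ['Pass', 'Fail', 'Pass', 'Fail']
```

Using a dictionary lookup rather than a default value means a misspelled or unexpected label fails loudly instead of being silently assigned to a class.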


4.4 4-class Classification 

In this part, we compare the performance of DOPE with the baseline methods in a 4-class classification setting; the experimental results are shown in Figure 6. We make the following observations based on the results in Figure 6.


• The observations we made for binary classification hold for 4-class classification as well. In particular, DOPE still outperforms the baseline approaches, more weekly click data helps course outcome prediction, and the LSTM handles sequential data more effectively than simple concatenation followed by a fully connected network (i.e., DOPE_FCN).


• Since more classes are considered, 4-class classification is a harder task than binary classification. In particular, Withdrawn is now a separate class, which might be “conceptually” hard for a model to discern from Fail.


4.5 Behavioral Feature Analysis 

Since behavioral data (i.e., click data) plays an essential role in determining a student's performance, we conduct a feature analysis experiment to investigate the importance of each behavior type. A similar feature analysis has been performed to gain insights into human behaviors [21]. To this end, we follow an ablation feature analysis in which, each time, we include one feature type and suppress the rest (setting their values to zero), and then obtain the F1 score from the model. We perform this experiment for binary classification with 20 weeks of click data included. Figure 7 shows the results, and we make the following observations accordingly.


• For all courses, the homepage feature type is associated with a high F1 score. This seems reasonable since most of the click activity occurs on the main page of the platform interface.


• Interestingly, clicks and activities in forums play an influential role in predicting whether a student fails or passes a course. This is in line with previous work [5], which showed that MOOC forum activities correlate with a student's academic performance.
Figure 6: Comparison results for 4-class classification using four different amounts of included weekly click data.


• The unique behavioral importance profile of a course can help policymakers, administrators, and even course interface designers prepare course materials in a more informed way. For instance, we observe that the wiki attribute plays an important role in performance prediction for ST2, while its effect is negligible for the other courses. This may indicate that the materials of ST2 require more wiki access, and the content could be adjusted accordingly.


Based on the observations above, we conclude that DOPE encodes behavioral data in an intuitive manner that is consistent with the findings of previous studies.
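The single-feature ablation protocol used in this subsection can be sketched as follows. For each click type, every other column of the weekly behavior tensor is zeroed, the trained model is re-scored, and the score is recorded. In this sketch, `toy_predict` and `accuracy` are hypothetical stand-ins for a trained DOPE model and the weighted F1 metric, chosen only to keep the example self-contained.

```python
import numpy as np

def keep_only(B, feature_idx):
    """Copy of B (..., q) with every click type except feature_idx set to zero."""
    masked = np.zeros_like(B)
    masked[..., feature_idx] = B[..., feature_idx]
    return masked

def single_feature_ablation(B, y_true, predict, score, feature_names):
    """Score a trained model when only one behavior type is kept at a time."""
    results = {}
    for idx, name in enumerate(feature_names):
        results[name] = score(y_true, predict(keep_only(B, idx)))
    return results

# Hypothetical stand-ins for a trained model and its evaluation metric.
def toy_predict(B):
    return np.where(B.sum(axis=(1, 2)) > 5, 1, 0)   # predict "Pass" if enough clicks

def accuracy(y, y_hat):
    return float(np.mean(y == y_hat))

B = np.zeros((3, 2, 3))     # 3 (student, course) pairs, 2 weeks, 3 click types
B[0, :, 0] = 4              # pair 0 only uses the homepage
B[1, :, 1] = 4              # pair 1 only uses the forum
y = np.array([1, 1, 0])
print(single_feature_ablation(B, y, toy_predict, accuracy, ["homepage", "forum", "wiki"]))
```

Zeroing inputs at evaluation time, rather than retraining without them, matches the "suppress the rest and obtain the F1 score from the model" description. It measures how much the trained model relies on each click type.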


4.6 Inter-course Outcome Evaluations 

By default, each course has its own model. In this section, however, we measure inter-course performance, where we train DOPE on one course and test it on another. Table 2 shows the results. Again, the models are trained for binary classification and incorporate 20 weeks of click data. For reference, we have also included the intra-course performance (i.e., the same course for training and testing) in the diagonal entries of Table 2. As expected, performance is higher when the training course and the test course are the same (i.e., the intra-course setting). This seems reasonable since the clicking patterns of a course's past offering (i.e., part of the training data) are expected to resemble those of its future offering (i.e., the testing data), and the model can more easily extract such patterns. Although the inter-course results are not as good as the intra-course ones, DOPE still achieves reasonable performance. This indicates that DOPE can detect salient click and demographic patterns that transfer from one course to another.


Table 2: Inter-course performance evaluation for binary classification using 20 weeks of click data (training course vs. test course).
Figure 7: Behavioral feature analysis on different courses.


5. RELATED WORK 


In the following, we highlight some of the works focusing on student dropout and performance prediction. In [35], the authors extracted 27 interpretive features and used logistic regression to predict student persistence. Another study used probabilistic soft logic to model student survival by constructing probabilistic soft logic rules and associating them. Unlike that work, which mainly considered forum features, a further study did not consider forum data, but instead used only clickstream data to train its prediction model; more specifically, it paired principal component analysis with a linear support vector machine for each week. A more comprehensive approach used standard classification trees and adaptive boosted trees to construct a two-stage Friedman and Nemenyi procedure for dropout prediction by processing different features such as clickstream-based, forum-based, and assignment-based features. More recently, a hybrid method for dropout prediction was studied that combines a decision tree and an extreme learning machine. In addition to these traditional machine learning methods, some researchers have tried different deep learning models for dropout prediction in online courses. In one such work, an LSTM was used to process features extracted from students' interactions with lecture videos, forums, quizzes, and problems. Another explored the potential benefits of employing a fully connected feed-forward neural network for dropout prediction. Different from previous work, a context-aware feature interaction network was proposed to incorporate context information of both participants and courses; more specifically, it used an attention-based mechanism for learning activity features. The method most similar to ours sought to conduct performance evaluations on students using a graph neural network (GNN), but there are primary differences: (1) it constructed separate small graphs of courses for each student, while DOPE constructs a single knowledge graph of historical student-course relations; (2) its graph neural network was used to obtain a graph classification for a given student based on that student's specific course graph, while our method uses the relational graph neural network to learn embeddings for both students and courses from a single large knowledge graph; and (3) DOPE furthermore utilizes an LSTM model to capture a student's rich sequential behavioral data beyond just using static, fixed student features.


6. CONCLUSION AND FUTURE WORK 


In this paper, we proposed a model for course performance prediction that we call Deep Online Performance Evaluation (DOPE). Our method first represents the online learning system as a knowledge graph, from which we learn student and course embeddings on historical data using a relational graph neural network. Simultaneously, DOPE utilizes an LSTM to encode the student behavior data into a condensed representation, since the data has a naturally sequential form. We tested the proposed model on six courses from the OULAD dataset, and the results showed that DOPE is feasible and can predict at-risk students in ongoing courses. We also investigated the usefulness of the different types of behavioral features and observed that DOPE encodes the data in an intuitive manner.


In the future, we will first analyze the class imbalance and sparsity issues of the dataset. One possible way to alleviate the sparsity would be through network alignment [6] of multiple MOOC datasets represented as knowledge graphs, or by connecting student behavior data from social media for better predictions in online education [22]. We will also investigate more advanced ways of handling behavioral data, for example, better ways to use “subpage” clicks than a simple aggregation that does not distinguish among the different “subpages”. In addition, we plan to apply our framework to traditional education systems, aiming to identify similarities and differences between online and traditional course performance prediction, since we believe this to be highly important in improving online learning systems.